Run train.py! Treated to an error:

```
UnicodeDecodeError: 'gbk' codec can't decode byte 0xbf in position 2: illegal multibyte sequence
```

Fix: on line 382 of dataset.py, change `with open(self.gt_files[index], 'r') as f:` to `with open(self.gt_files[index], 'r', encoding='utf-8') as f:`.
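A related trick: `utf-8-sig` also strips the BOM that ICDAR ground-truth files often carry, which would otherwise survive a plain `utf-8` read. A standalone sketch (not the project's dataset.py; the file name and contents are made up):

```python
import tempfile, os

# ICDAR-style GT files often start with a UTF-8 BOM; 'utf-8-sig'
# strips it on read, so no manual lstrip('\ufeff') is needed later.
path = os.path.join(tempfile.mkdtemp(), "gt_img_1.txt")
with open(path, "w", encoding="utf-8-sig") as f:
    f.write("377,117,463,117,465,130,378,130,Genaxis Theatre\n")

with open(path, "r", encoding="utf-8-sig") as f:
    line = f.readline()

print(line.startswith("377"))  # True: BOM removed
```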
Run train.py! Treated to another error:

```
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 1.50 GiB (GPU 0; 8.00 GiB total capacity; 3.14 GiB already allocated; 2.79 GiB free; 3.15 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
```

Fix: in train.py, change `batch_size = 24` to `batch_size = 4`.
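Shrinking the batch is the blunt fix; the error message itself also suggests capping the allocator's split size to fight fragmentation. A sketch (the 128 MiB value is a guess to tune per GPU, and the variable must be set before the first CUDA allocation, easiest before importing torch):

```python
import os

# Cap PyTorch's caching-allocator block splits to reduce fragmentation.
# 128 is an illustrative value, not a recommendation from the project.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

print(os.environ["PYTORCH_CUDA_ALLOC_CONF"])  # max_split_size_mb:128
```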
Run train.py! It runs!
WSL2
Setting up the environment
```shell
conda create -n EAST python=3.7
conda activate EAST
pip install shapely
pip install opencv-python==4.0.0.21
pip install lanms-proper
```
Run it!
```shell
python3 train.py
```
Treated to an error:
```
File "/home/gz/anaconda3/envs/EAST/lib/python3.7/site-packages/cv2/__init__.py", line 3, in <module>
    from .cv2 import *
ImportError: libSM.so.6: cannot open shared object file: No such file or directory
```
Fix:
```shell
sudo apt update
sudo apt install libsm6
```
Treated to another error:
```
Could not load library libcudnn_cnn_infer.so.8. Error: libcuda.so: cannot open shared object file: No such file or directory
Please make sure libcudnn_cnn_infer.so.8 is in your library path!
```
Install the CUDA toolkit (which brings in the missing libraries):
```shell
sudo apt install nvidia-cuda-toolkit
```
Run it!
```
/home/gz/anaconda3/envs/EAST/lib/python3.7/site-packages/torch/optim/lr_scheduler.py:143: UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`. Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
/home/gz/anaconda3/envs/EAST/lib/python3.7/site-packages/shapely/set_operations.py:133: RuntimeWarning: invalid value encountered in intersection
  return lib.intersection(a, b, **kwargs)
classify loss is 0.98071122, angle loss is 0.68633509, iou loss is 5.08373260
Epoch is [1/600], mini-batch is [1/250], time consumption is 8.06183171, batch_loss is 12.92779446
classify loss is 0.99145019, angle loss is 0.75015461, iou loss is 4.81786251
Epoch is [1/600], mini-batch is [2/250], time consumption is 0.21901011, batch_loss is 13.31085873
classify loss is 0.99974638, angle loss is 0.74429435, iou loss is 5.48675823
Epoch is [1/600], mini-batch is [3/250], time consumption is 0.21214652, batch_loss is 13.92944813
classify loss is 0.99397326, angle loss is 0.60727608, iou loss is 3.27876091
Epoch is [1/600], mini-batch is [4/250], time consumption is 0.22212124, batch_loss is 10.34549522
classify loss is 0.99331516, angle loss is 0.67070889, iou loss is 3.67775035
Epoch is [1/600], mini-batch is [5/250], time consumption is 0.23853326, batch_loss is 11.37815380
classify loss is 0.98511696, angle loss is 0.73328424, iou loss is 3.17167139
Epoch is [1/600], mini-batch is [6/250], time consumption is 0.20371103, batch_loss is 11.48963070
classify loss is 0.99793059, angle loss is 0.60213274, iou loss is 4.67736626
...
```
MindSpore
Reading the code
train.py
It looks much like any other train.py: set up all the hyperparameters, then load the model and optimizer, and start training!
All the details live in src/.
```python
from src.util import AverageMeter, get_param_groups
from src.east import EAST, EastWithLossCell
from src.logger import get_logger
from src.initializer import default_recurisive_init
from src.dataset import create_east_dataset
from src.lr_scheduler import get_lr
```
```python
def construct(self, out):
    """ Construction of VGG """
    f_0 = out
    out = self.cast(out, mstype.float32)
    out = self.conv1_1(out)
    out = self.relu(out)
    out = self.conv1_2(out)
    out = self.relu(out)
    out = self.max_pool(out)

    # ... (middle stages elided) ...

    out = self.max_pool(out)
    f_4 = out
    out = self.conv5_1(out)
    out = self.relu(out)
    out = self.conv5_2(out)
    out = self.relu(out)
    out = self.conv5_3(out)
    out = self.relu(out)
    out = self.max_pool(out)
    f_5 = out
    return f_0, f_2, f_3, f_4, f_5
```
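For intuition, the spatial sizes of the returned features can be bookkept like this. The stride assignment for f_2..f_5 (4, 8, 16, 32) is my inference from the /16, /8, /4 upsampling targets in the Merge module, not something the excerpt states:

```python
# Each 2x2 max-pool halves H and W; the 3x3 convs preserve them.
# Assumed strides for f_2..f_5: 4, 8, 16, 32 (inferred, see lead-in).
def feature_sizes(h, w, strides=(4, 8, 16, 32)):
    """Spatial sizes of the captured features for an h x w input."""
    return [(h // s, w // s) for s in strides]

print(feature_sizes(512, 512))  # [(128, 128), (64, 64), (32, 32), (16, 16)]
```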
class Merge
P here is the MindSpore operators module: `import mindspore.ops as P` pulls in the operator classes used below, such as `Concat()` and `ResizeBilinear()`.
```python
out = P.ResizeBilinear((img_hight / 16, img_width / 16), True)(f4)
out = self.concat((out, f3))
out = self.relu1(self.bn1(self.conv1(out)))
out = self.relu2(self.bn2(self.conv2(out)))

out = P.ResizeBilinear((img_hight / 8, img_width / 8), True)(out)
out = self.concat((out, f2))
out = self.relu3(self.bn3(self.conv3(out)))
out = self.relu4(self.bn4(self.conv4(out)))

out = P.ResizeBilinear((img_hight / 4, img_width / 4), True)(out)
out = self.concat((out, f1))
out = self.relu5(self.bn5(self.conv5(out)))
out = self.relu6(self.bn6(self.conv6(out)))

out = self.relu7(self.bn7(self.conv7(out)))
return out
```
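Each merge stage above is just "upsample the deeper map, concatenate channels, convolve". A numpy sketch of the shape bookkeeping, where nearest-neighbor repetition stands in for ResizeBilinear and the channel counts are made up for illustration:

```python
import numpy as np

# Deeper feature at half the spatial size of the shallower one (NCHW).
deep = np.ones((1, 512, 16, 16))
shallow = np.ones((1, 256, 32, 32))

up = deep.repeat(2, axis=2).repeat(2, axis=3)   # 16x16 -> 32x32
merged = np.concatenate([up, shallow], axis=1)  # channels add: 512 + 256
print(merged.shape)  # (1, 768, 32, 32)
```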
extract_vertices()

```python
def extract_vertices(lines):
    '''extract vertices info from txt lines
    Input:
        lines    : list of strings, each describing one text region
                   (vertex coordinates plus a transcription label)
    Output:
        vertices : vertices of text regions <numpy.ndarray, (n,8)>
        labels   : 1->valid, 0->ignore, <numpy.ndarray, (n,)>
    '''
    labels = []    # final labels
    vertices = []  # vertex info
    for line in lines:
        # strip the trailing newline and any leading BOM (Byte Order Mark),
        # then split the line and keep the first eight integers
        vertices.append(list(map(int, line.rstrip('\n').lstrip('\ufeff').split(',')[:8])))
        label = 0 if '###' in line else 1
        labels.append(label)
    # return vertices and labels as numpy arrays
    return np.array(vertices), np.array(labels)
```
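A quick sanity check of the parser, restated standalone so it runs by itself; the sample lines follow the ICDAR-2015 style with `###` marking ignored regions:

```python
import numpy as np

def extract_vertices(lines):
    # Standalone restatement of the parser above.
    vertices, labels = [], []
    for line in lines:
        vertices.append(list(map(int, line.rstrip('\n').lstrip('\ufeff').split(',')[:8])))
        labels.append(0 if '###' in line else 1)
    return np.array(vertices), np.array(labels)

lines = [
    "\ufeff377,117,463,117,465,130,378,130,Genaxis Theatre\n",
    "374,155,409,155,409,170,374,170,###\n",
]
v, l = extract_vertices(lines)
print(v.shape, l.tolist())  # (2, 8) [1, 0]
```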
adjust_height()
```python
def adjust_height(img, vertices, ratio=0.2):
    '''adjust height of image to aug data
    Input:
        img         : PIL Image
        vertices    : vertices of text regions <numpy.ndarray, (n,8)>
        ratio       : height changes in [0.8, 1.2]
    Output:
        img         : adjusted PIL Image
        new_vertices: adjusted vertices
    '''
    # pick a random height scale in [1 - ratio, 1 + ratio]
    ratio_h = 1 + ratio * (np.random.rand() * 2 - 1)
    old_h = img.height
    # compute the new height from the scale; np.around() rounds to the
    # nearest integer (default precision is 0 decimal places)
    new_h = int(np.around(old_h * ratio_h))
    img = img.resize((img.width, new_h), Image.BILINEAR)
```
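The excerpt stops before the vertex update, but after the resize every y coordinate (the odd indices in the `x1,y1,...,x4,y4` layout) has to be scaled by `new_h / old_h` as well. A standalone sketch of that bookkeeping (the helper name is mine, not the repo's):

```python
import numpy as np

def rescale_vertex_heights(vertices, old_h, new_h):
    # y coordinates live at odd indices of each (n, 8) row.
    new_vertices = vertices.astype(float).copy()
    if vertices.size > 0:
        new_vertices[:, [1, 3, 5, 7]] *= new_h / old_h
    return new_vertices

v = np.array([[0, 10, 50, 10, 50, 20, 0, 20]])
print(rescale_vertex_heights(v, old_h=100, new_h=120).tolist())
# [[0.0, 12.0, 50.0, 12.0, 50.0, 24.0, 0.0, 24.0]]
```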
crop_img()

```python
def crop_img(img, vertices, labels, length):
    '''crop img patches to obtain batch and augment
    Input:
        img         : PIL Image
        vertices    : vertices of text regions <numpy.ndarray, (n,8)>
        labels      : 1->valid, 0->ignore, <numpy.ndarray, (n,)>
        length      : length of cropped image region
    Output:
        region      : cropped image region
        new_vertices: new vertices in cropped region
    '''
    # original height h and width w
    h, w = img.height, img.width
    # confirm the shortest side of image >= length:
    # if the shorter side is below the crop length, scale up with PIL's resize()
    if h >= w and w < length:
        img = img.resize((length, int(h * length / w)), Image.BILINEAR)
    elif h < w and h < length:
        img = img.resize((int(w * length / h), length), Image.BILINEAR)
    ratio_w = img.width / w
    ratio_h = img.height / h
    assert (ratio_w >= 1 and ratio_h >= 1)
```
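The min-side guard in isolation, as pure arithmetic on the resize targets (the helper name is mine): if the shorter side is below `length`, scale the image so that side equals `length`, preserving aspect ratio; otherwise leave it alone.

```python
def min_side_resize(h, w, length):
    # Returns the (h, w) the image would be resized to.
    if h >= w and w < length:
        return int(h * length / w), length
    if h < w and h < length:
        return length, int(w * length / h)
    return h, w

print(min_side_resize(300, 200, 512))  # (768, 512): short side brought up to 512
print(min_side_resize(800, 600, 512))  # (800, 600): already big enough, unchanged
```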
detect_dataset()

```python
def detect_dataset(model, test_img_path, submit_path):
    """detection on whole dataset, save .txt results in submit_path
    Input:
        model        : detection model instance
        test_img_path: path to the folder of test images
        submit_path  : where the result files for evaluation are saved
    """
    # list all test images and sort them by file name
    img_files = os.listdir(test_img_path)
    img_files = sorted([os.path.join(test_img_path, img_file) for img_file in img_files])

    for i, img_file in enumerate(img_files):
        # run detect() on each image to get the box coordinates
        print('evaluating {} image'.format(i), end='\r')
        boxes = detect(Image.open(img_file), model)
        seq = []
        if boxes is not None:
            # turn each box into a comma-separated coordinate line
            seq.extend([','.join([str(int(b)) for b in box[:-1]]) + '\n' for box in boxes])
        # write the lines to res_<image name>.txt under submit_path
        with open(os.path.join(submit_path, 'res_' + os.path.basename(img_file).replace('.jpg', '.txt')), 'w') as f:
            f.writelines(seq)
```
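The serialization step in isolation: each detected box (eight coordinates plus a trailing score) becomes one comma-separated line, with the score dropped by the `box[:-1]` slice.

```python
# One hypothetical detection: 4 corner points plus a confidence score.
boxes = [[10, 20, 110, 20, 110, 60, 10, 60, 0.97]]

seq = [','.join(str(int(b)) for b in box[:-1]) + '\n' for box in boxes]
print(seq[0], end='')  # 10,20,110,20,110,60,10,60
```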
detect()
```python
def detect(img, model):
    """detect text regions of img using model
    Input:
        img   : PIL Image
        model : detection model
    Output:
        detected polys
    """
    # resize the image and record the height/width ratios ratio_h, ratio_w
    img, ratio_h, ratio_w = resize_img(img)
    # run the model on the resized image to get the score map and the
    # geometry (text-box parameter) map
    score, geo = model(load_pil(img))
    # drop the batch dimension (4-D -> 3-D) with MindSpore's Squeeze()
    score = P.Squeeze(0)(score)
    geo = P.Squeeze(0)(geo)
    # decode box coordinates from the score and geometry maps
    boxes = get_boxes(score.asnumpy(), geo.asnumpy())
    # map the boxes back onto the original image via ratio_h and ratio_w
    return adjust_ratio(boxes, ratio_w, ratio_h)
```
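`adjust_ratio()` itself is simple coordinate bookkeeping. A standalone sketch (this re-implementation is mine, assuming x coordinates divide by ratio_w and y coordinates by ratio_h, with the trailing score column left untouched):

```python
import numpy as np

def adjust_ratio(boxes, ratio_w, ratio_h):
    # Detections live in resized-image coordinates; divide to map them
    # back onto the original image.
    if boxes is None or boxes.size == 0:
        return None
    boxes = boxes.astype(float).copy()
    boxes[:, [0, 2, 4, 6]] /= ratio_w
    boxes[:, [1, 3, 5, 7]] /= ratio_h
    return boxes

b = np.array([[8., 4., 16., 4., 16., 8., 8., 8., 0.9]])
print(adjust_ratio(b, ratio_w=2.0, ratio_h=2.0)[0].tolist())
# [4.0, 2.0, 8.0, 2.0, 8.0, 4.0, 4.0, 4.0, 0.9]
```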
get_boxes()
```python
def get_boxes(score, geo, score_thresh=0.9, nms_thresh=0.2):
    """get boxes from feature map
    Input:
        score       : score map from model <numpy.ndarray, (1,row,col)>
        geo         : geo map from model <numpy.ndarray, (5,row,col)>
        score_thresh: threshold to segment score map
        nms_thresh  : threshold in nms
    Output:
        boxes       : final polys <numpy.ndarray, (n,9)>
    """
    # drop the leading dimension so score is a 2-D array
    score = score[0, :, :]
    # collect the points above score_thresh as an n x 2 matrix of (r, c)
    xy_text = np.argwhere(score > score_thresh)  # n x 2, format is [r, c]
    if xy_text.size == 0:
        return None

    # sort by row so earlier rows are considered first, then flip (r, c)
    # into (x, y): since the map is row-major, the column index becomes x
    # and the row index becomes y
    xy_text = xy_text[np.argsort(xy_text[:, 0])]
    valid_pos = xy_text[:, ::-1].copy()  # n x 2, [x, y]
    # gather the geometry values at those positions: a 5 x n matrix
    valid_geo = geo[:, xy_text[:, 0], xy_text[:, 1]]  # 5 x n
    # restore_polys() turns (position, geometry) pairs back into polygon
    # point sets polys_restored, with their index values
    polys_restored, index = restore_polys(valid_pos, valid_geo, score.shape)
    if polys_restored.size == 0:
        return None
```
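The (r, c) → (x, y) flip is easy to get wrong, so here it is in isolation on a tiny score map:

```python
import numpy as np

# 4x4 score map with two confident pixels.
score = np.zeros((4, 4))
score[1, 2] = 0.95
score[3, 0] = 0.99

xy_text = np.argwhere(score > 0.9)           # (r, c) pairs in row-major order
xy_text = xy_text[np.argsort(xy_text[:, 0])] # sort by row
valid_pos = xy_text[:, ::-1].copy()          # flip each pair: (x, y) = (c, r)
print(valid_pos.tolist())  # [[2, 1], [0, 3]]
```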
In this project, the recommended file organization is as follows:
```
.
└─data
  ├─icdar2015
    ├─Training        # Training set
      ├─image         # Images in training set
      ├─groundTruth   # GT in training set
    └─Test            # Test set
      ├─image         # Images in test set
      ├─groundTruth   # GT in test set
```
One-stop environment setup! The stuff in requirements.txt is genuinely hard to install; I ended up installing everything by hand...
```shell
source activate base                                # needed on first login to the server
python -c "import mindspore;mindspore.run_check()"  # check the MindSpore version
conda create -n east --clone base                   # clone the base environment
conda activate east                                 # activate the east environment
pip install numpy
pip install opencv-python
pip install shapely
pip install pillow
pip install lanms-neo
pip install --upgrade setuptools                    # upgrade setuptools
pip install Polygon3                                # hard to install; may need the setuptools upgrade
pip install onnxruntime
```